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Executive  Summary 

In  the  last  decade  the  computational  power  of  discrete  optimization  methodology  has  increased 
remarkably  to  the  point  where  problems  that  could  not  be  solved  with  days  of  computation  ean 
now  be  solved  in  minutes  by  commercial  solvers  such  as  CPLEX  and  XPRESS.  This  success  has 
stimulated  the  need  for  methodology  to  solve  even  much  larger  problems  and  the  desire  to  solve 
problems  in  real-time.  We  have  eondueted  researeh  that  has  yielded  computationally  effective 
algorithms  to  provide  high-quality  solutions  to  very  large-scale  planning  problems  and  high-quality 
solutions  in  (nearly)  real-time  to  operational  problems. 

Traditionally,  this  goal  has  been  pursued  with  heuristie  approaches.  The  underlying  theme 
of  our  research  has  been  different.  We  use  exact  optimization  whenever  possible.  For  example, 
to  solve  problems  that  are  too  large  for  exact  optimization,  we  embed  exact  optimization  in  a 
heuristic  search  algorithm.  To  solve  real-time  instances  that  differ  only  by  small  perturbations  of 
coefficients,  we  use  exact  optimization  as  a  planning  tool  whereby  we  solve  a  eore  instanee  exactly 
and  use  results  from  its  solution  to  make  the  solution  of  the  real-time  instances  much  faster  than 
would  be  the  ease  if  they  were  solved  from  scratch.  Furthermore,  we  revisit  the  least  studied  and 
least  understood  aspeet  of  integer  programming  methodology,  namely  branching,  and  develop  novel 
brandling  strategies  based  on  learning  concepts  and  restart  ideas. 

The  faculty  involved  in  this  research  are  Professor  George  Nemhauser  (PI)  and  Professor  Martin 
Savelsbergh  (eo-PI).  The  students  involved  are  Michael  Hewitt,  Fatma  Kilinc-Karzan,  and  Juan 
Pablo  Vielma.  Hewitt  is  expected  to  complete  his  PhD  thesis  entitled  Combining  Exact  and 
Heuristic  Methods  for  Discrete  Optimization  this  summer.  Vielma  will  also  complete  this  summer 
and  his  thesis  is  entitled  Mixed  Integer  Programming  Approaches  for  Nonlinear  and  Stochastic 
Programming.  Papers  submitted  for  publication  so  far  are: 

M.  Hewitt,  G.L.  Nemhauser,  and  M.W.P.  Savelsbergh.  “Combining  Exaet  and  Heuristic  Ap¬ 
proaches  for  the  Capacitated  Fixed  Charge  Network  Flow  Problem.”  Submitted  to  INFORMS  J. 
on  Computing  (2008). 

F.  Kiling-Karzan,  A.  Toreillo,  S.  Ahmed,  G.L.  Nemhauser,  M.W.P.  Savelsbergh.  “Approximating 
the  Stability  Region  for  Binary  Mixed-Integer  Programs.”  Submitted  to  Operations  Research 
Letters  (2008). 

F.  Kilmg-Karzan,  G.L.  Nemhauser,  M.W.P.  Savelsbergh.  "Information  Based  Branching  Rules  for 
Binary  Mixed  Integer  Problems.”  Submitted  to  INFORMS  J.  on  Computing  (2009). 
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1  Embedding  Exact  Optimization  in  Heuristic  Search 


Despite  the  impressive  gains  that  have  occurred  in  mixed-integer  programming  (MIP)  solvers  in  the 
last  decade,  there  are  still  many  significant  practical  problems  that  cannot  be  solved  to  optimality 
using  the  leading  commercial  solvers  such  as  CPLEX  and  XPRESS.  These  problems  are  either 
too  large  or  too  complex  for  reasons  such  as  difficulty  in  finding  a  feasible  solution  or  a  large  gap 
between  the  values  of  the  linear  programming  (LP)  relaxations  and  MIP  optimal  solutions.  In  such 
situations,  a  pragmatic  alternative  is  to  search  for  provably  good  solutions. 

One  of  our  research  thrusts  has  been  to  use  neighborhood  search  to  obtain  nearly  optimal 
solutions  to  huge  and  complex  problems.  The  basic  idea  is  very  simple.  A  subproblcm  involving  a 
suitably  small  subset  of  the  variables  is  selected  and  all  other  variables  are  fixed.  This  subproblcm 
is  solved  by  a  MIP  solver.  Hopefully  a  better  solution  to  the  whole  problem  is  obtained.  Then  we 
repeat  the  process  by  choosing  a  different  subproblem  (see  Algorithm  1). 


Algorithm  1  Neighborhood  Search 

while  the  search  time  has  not  exceeded  a  prespecified  limit  T  do 
Choose  a  subset  of  variables  V 
Solve  the  IP  defined  by  variables  in  V 
if  an  improved  solution  is  found  then 
Update  the  global  solution 
end  if 
end  while 


The  key  to  making  this  approach  work  is  problem  dependent.  We  illustrate  this  approach  on 
the  Fixed  Charge  Network  Flow  (FCNF)  problem,  which  is  a  classic  discrete  optimization  problem 
in  which  a  set  of  commodities  has  to  be  routed  through  a  directed  network.  Each  commodity  has 
an  origin,  a  destination,  and  a  quantity.  Each  network  arc  has  a  capacity.  There  is  a  fixed  cost 
associated  with  using  an  arc  and  a  variable  cost  that  depends  on  the  quantity  routed  along  the 
arc.  The  objective  is  to  minimize  the  total  cost.  Two  versions  of  the  problem  arc  considered: 
commodities  have  to  be  routed  along  a  single  path  and  commodities  can  be  routed  along  multiple 
paths. 

By  selecting  a  suitably  small  subset  of  the  commodities  and  assuming  that  the  flow  paths  for 
all  other  commodities  are  fixed  a  tractable  integer  program  can  be  defined  and  solved  using  an 
IP  solver.  Similarly,  by  selecting  a  suitably  small  subnetwork  and  assuming  all  flows  outside  the 
subnetwork  are  fixed  a  tractable  integer  program  can  be  defined  and  solved  using  an  IP  solver. 
Hopefully,  a  better  solution  to  the  whole  problem  is  obtained  by  these  subproblem  solves.  The 
process  can  be  repeated  multiple  times  by  choosing  different  subsets  of  commodities  and  different 
subnetworks. 

Both  the  selection  of  a  subset  of  commodities  and  the  selection  of  a  subnetwork  define  a  subset 
of  network  arcs.  If  this  subset  of  network  arcs  contains  all  the  arcs  that  carry  flow  in  the  current 
best  solution,  then  we  can  seed  the  IP  with  that  solution  and  ensure  that  we  only  find  solutions 
that  arc  at  least  as  good  as  the  current  best.  Therefore,  all  our  schemes  for  defining  a  smaller  IP 
start  from  the  subset  of  arcs  associated  with  the  current  best  solution.  Also,  our  schemes  do  not 
choose  arcs  directly  but  choose  paths,  thus  ensuring  that  every  arc  chosen  exists  in  sonic  feasible 
solution.  Examples  of  commodity  subset  selection  ideas  include: 

•  Select  commodities  whose  paths  in  the  current  best  solution  use  arcs  for  which  the  reduced 
cost  in  the  most  recent  LP  solution  are  far  from  0.  Thus,  we  try  to  re-route  commodities 
away  from  arcs  that  are  far  away  from  satisfying  complementary  slackness. 
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•  Suppose  in  the  current  best  solution,  there  is  a  node  with  two  commodities  entering  on  a 
single  arc,  but  leaving  on  two  different  arcs.  When  these  arcs  are  not  fully  used,  the  two 
commodities  are  good  candidates  for  re-optimization. 

•  Since  we  want  to  find  sets  of  commodities  that  are  likely  to  share  arcs,  we  search  for  com¬ 
modities  whose  paths  in  the  current  best  solution  are  close  together. 

•  Suppose  the  network  is  very  large  and  there  are  two  commodities  and  that  have  only  one 
feasible  path  from  their  source  to  their  sink  In  that  case,  it  is  unlikely  that  these  paths  have 
a  common  arc  hence  re-optimizing  commodities  the  two  commodities  together  is  not  likely  to 
lead  to  improvements.  Conversely,  if  the  two  commodities  have  many  paths,  then  they  are 
likely  to  share  arcs  and  thus  are  good  candidates  to  optimize  together. 

Examples  of  subnetwork  selection  ideas  include  (they  refer  to  to  the  path-based  formulation  we 
maintain  to  get  dual  bounds): 

•  Select  paths  that  often  appear  in  improving  solutions,  but  are  not  in  our  current  best  solution. 

•  Select  paths  that  appear  in  the  optimal  solution  to  the  LP  relaxation  of  the  path-based 
formulation. 

•  Select  the  path  produced  for  each  commodity  by  the  pricing  problem  associated  with  the 
path-based  formulation. 

Note  that  the  approach  outlined  above  differs  in  two  respects  from  more  traditional  neighbor¬ 
hood  search  methods:  (1)  the  neighborhood  is  explored  using  integer  programming  technology,  and 
(2)  the  neighborhood  selection  is  guided  by  appropriately  chosen  problem  dependent  metrics  and 
changes  as  the  search  progresses.  Note  also  that  this  approach  can  be  used  on  extremely  large 
instances  because  the  algorithm  never  requires  the  full  instance  to  be  in  memory. 

The  resulting  solution  approach  is  very  effective.  For  instances  with  500  nodes,  with  2000,  2500 
and  3000  arcs,  and  with  50,  100,  150,  and  200  commodities,  we  compared  the  quality  of  the  solution 
produced  by  our  solution  approach  with  the  best  solution  found  by  CPLEX  after  15  minutes  of 
computation  and  after  12  hours  of  computation.  On  average,  the  solution  we  found  in  less  than  15 
minutes  is  35%  better  than  CPLEX'  best  solution  after  15  minutes  and  20%  better  than  CPLEX' 
best  solution  after  12  hours.  Furthermore,  we  find  a  better  solution  than  CPLEX’  best  solution 
after  15  minutes  within  1  minute,  and  CPLEX'  best  solution  after  12  hours  within  3  minutes.  On 
these  instances  the  approach  produces  dual  bounds  that  are  25%  stronger  than  the  LP  relaxat  ion. 
We  also  compared  the  quality  of  the  solutions  produced  by  our  solution  approach  with  the  quality 
of  the  solutions  produced  by  a  recent  implementation  of  the  tabu  search  algorithm  of  Ghamlouche 
et  ah  For  nearly  all  instances  in  their  test  set,  our  solution  is  better  than  the  solution  of  the  tabu 
search  algorithm  and  this  solution  is  found  much  faster. 

Table  1  presents  details  of  one  of  our  experiments.  It  shows  the  value  of  the  best  solution  found 
bv  CPLEX  in  15  minutes  and  24  hours,  the  value  of  the  solution  found  by  our  heuristic  search,  and 
the  difference  in  quality  between  the  solutions  found  by  CPLEX  and  the  one  found  by  our  heuristic 
search.  An  “X”  in  a  column  indicates  that  no  feasible  solution  was  found.  (The  value  reported  for 
our  heuristic  search  is  the  value  of  the  solution  produced  within  15  minutes.) 

We  observe  that  in  every  instance  IP  Search  finds  a  better  solution  in  15  minutes  than  CPLEX 
does  in  12  hours.  We  also  see  that  the  improvement  over  the  solution  found  by  CPLEX  in  15 
minutes  is  significant,  often  greater  than  30%.  Even  the  improvement  over  the  solution  found  by- 
CP  LEX  in  12  hours  is  impressive,  often  greater  than  20%.  Unfortunately,  little  can  be  said  with 
confidence  regarding  the  true  optimality  gap  of  the  solutions  produced  by  IP  Search  since  the  dual 
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Table  1:  Primal-side  Comparison  with  CPLEX 


Problem 

CPLEX-  15M 

CPLEX-12H 

IP  Search 

CPLEX- 15M 
Gap 

CPLEX-  12H 
Gap 

500,2000,50,  F,L 

5,301,081 

5,301,081 

3,910.120 

-35.57 

-35.57 

500, 2000, 50,F,T 

X 

7,927,065 

5,249,040 

N/A 

-51.02 

500,2000, 100, F,L 

8,944,724 

8,299,799 

6,764,310 

-32.23 

-22.70 

500,2000, 100, F,T 

10,199,000 

8,306,181 

7,718,750 

-32.13 

-7.61 

500,2000, 150, F,L 

10,996,000 

10,080,000 

8,618,060 

-27.59 

-16.96 

500,2000, 150, F,T 

12,115,000 

10,770,000 

9,448,890 

-28.22 

-13.98 

500, 2000, 200, F,L 

13,808,000 

12,824,000 

10,333,200 

-33.63 

-24.10 

500, 2000, 200, F,T 

X 

X 

12,425,600 

N/A 

N/A 

500, 2500, 50.F,L 

4,611,275 

4,611,275 

3,841,350 

-20.04 

-20.04 

500, 2500, 50, F,T 

5,779,926 

5,084,529 

4,666,740 

-23.85 

-8.95 

500,2500, 100,  F,L 

9,351,042 

9,251,042 

6,875,420 

-36.01 

-34.55 

500,2500, 100, F,T 

9,724,997 

7,995,284 

7,235,520 

-34.41 

-10.50 

500,2500, 150, F,L 

13,660,000 

12,497,000 

9,730,100 

-40.39 

-28.44 

500,2500, 150, F,T 

11,385,000 

10,683,000 

7,934,360 

-43.49 

-34.64 

500, 2500, 200,  F,L 

15,539,000 

13,468.000 

11,261,300 

-37.99 

-19.60 

500,2500,200,  F,T 

18,906,000 

14,948.000 

12.825,300 

-47.41 

-16.55 

500, 3000,50, F,L 

5,098,318 

5,098,318 

3,596,980 

-41.74 

-41.74 

500,3000,50.  F,T 

5,615,096 

4,866,768 

4,504.260 

-24.66 

-8.05 

500,3000, 100, F,L 

8,721,798 

8,721,798 

6,577,980 

-32.59 

-32.59 

500,3000, 100,  F,T 

10,119,000 

8,330,109 

7,517,970 

-34.60 

-10.80 

500,3000, 150, F,L 

12,628,000 

12.623,000 

9,214,960 

-37.04 

-36.98 

500,3000, 150,  F,T 

12,615,000 

10.147,000 

9,186,840 

-37.32 

-10.45 

500, 3000,200,  F,L 

15,039,000 

13,441,000 

10,853,400 

-38.56 

-23.84 

500,3000,200, F,T 

17,883,000 

13,674,000 

11,578.000 

-54.46 

-18.10 

Average 

-35.18 

-22.95 

bounds  produced  by  CPLEX  change  very  little  over  the  course  of  the  execution  and  are  likely  to 
be  weak.  In  fact,  for  many  of  the  loosely  capacitated  instances,  CPLEX  did  not  find  a  significant ly 
better  primal  solution  in  12  hours  than  it  did  in  15  minutes.  This  highlights  the  difficulty  that  an 
LP- based  branch-and-bound  algorithm  can  have  in  finding  good  primal  solutions  when  the  dual 
bounds  are  weak. 

For  further  details  on  the  approach  and  additional  computational  results  see: 

M.  Hewitt,  G.L.  Nemhauser,  and  M.W.P.  Savelsbergh.  “Combining  Exact  and  Heuristic  Ap¬ 
proaches  for  the  Capacitated  Fixed  Charge  Network  Flow  Problem.”  Submitted  to  INFORMS  J. 
on  Computing  (2008). 

2  Approximating  the  Stability  Region  for  Binary  Mixed-Integer 
Programs 

Suppose  we  have  a  difficult  discrete  optimization  problem  with  some  binary  variables  and  a  linear 
objective  function  for  these  variables.  In  addition  to  finding  an  optimal  solution,  we  would  like  to 
know  how  much  the  objective  vector  can  change  while  maintaining  the  optimality  of  the  solution. 
This  stability  information  is  useful,  for  example,  when  solving  problems  with  perturbed  objective 
functions  rapidly. 

We  consider  the  optimization  problem 


2*  =  max  c*x  -f  h(y) 
s.t.  (x,  y)  €  X 
x  €  {0, l}n, 
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(■ P{c *)) 


where  c*x  =  c*Xi,  N  —  {1, . . .  ,  n}  and  (x*,y*)  is  an  optimal  solution.  We  are  interested  in 

the  stability  of  (x*,  y*)  with  respect  to  variations  of  the  cost  vector  c*. 

Definition  2.1.  The  stability  region  of  (x*,y*)  is  the  region  C  C  Rn  s.t.  c  E  C  if  and  only  if 
(x*,y*)  is  optimal  for  P{c).  That  is,  C  =  {c  E  Rn  :  c(x  —  x*)  <  h(y*)  —  h(y),V  (x,  y)  E  X}. 

By  possibly  complementing  variables,  we  assume  without  loss  of  generality  that  x*  =  0,  so  the 
optimal  value  of  P  is  z*  =  h(y*). 

Remark  2.1.  Let  (x,  y)  be  feasible  for  P,  with  objective  value  z  =  c*x  +  h(y)  <  z* .  Suppose  c.  E  Rn 
satisfies  cx  >  z*  —  z  +  c*x.  Then  c  $  C . 

Remark  2.2.  Let  x  €  {0,  l}n,  and  define  u(£)  =  max^  y  Wlth  v(x)  —  — oc  if  no  y  satisfies 
(x,  y)  €  X.  Then  C={ceRn:cx<  h{y*)  -  v(x),  V  x  €  {0,  l}n}. 

Remark  2.2  implies  that  C  is  a  polyhedron  possibly  defined  by  an  exponential  number  of  in¬ 
equalities.  We  choose  to  approximate  C  using  polyhedra  defined  by  polynomially  many  inequalities. 
The  next  two  definitions  further  explain  this  approximation. 

Definition  2.2.  Let  C  be  the  stability  region  of  (x*,  y *).  An  outer  approximation  of  C  is  a  region 
C+  C  Rn  satisfying  C+  Q  C.  An  inner  approximation  of  C  is  a  region  C~  C  Rn  satisfying  C~  C  C. 


The  following  are  two  simple  but  important  consequences  of  Definition  2.2  that  help  us  obtain 
small  outer  approximations  and  large  inner  approximations. 

Proposition  2.3. 

i)  IfC^Cf  are  outer  approximations ,  is  an  outer  approximation, 

ii)  If  Cf  are  inner  approximations ,  co nv(C{"  U  Cfj)  is  an  inner  approximation. 

Next,  we  show  how  to  obtain  inner  and  outer  approximations  of  C  by  solving  n  problems 
{Pj  where  each  Pj  is  given  by 

Zj  =  max  c* x  +  h(y) 

s.t.  (*,y)€X  (P3) 

X  e  {0,1 }n 

Xj  =  1. 

Throughout,  we  assume  without  loss  of  generality  that  all  problems  Pj  are  feasible.  Accordingly, 
we  assume  that  we  have  solved  the  problems  Pj  for  every  j  E  N,  and  determined  optimal  solutions 
(xJ  ,  yi)  with  corresponding  objective  values  Zj. 

The  following  observation  follows  directly  from  Remark  2.1  and  Proposition  2.3. 

Proposition  2.1.  The  set  C+  =  {c  E  Rn  :  cxJ  <  z*  —  Zj  4-  c*xJ,  V  j  E  Nj  is  an  outer  approxima¬ 
tion  of  C . 

Let  7j  =  z*  —  Zj  +  c*.  Observe  that  the  outer  approximation  of  Proposition  2.1  satisfies 

{c  €  C+  :  c  >  c*}  c  {c  €  Rn  :  c*  <  Cj  <  7j,V  j  €  N}.  (1) 

In  other  words,  in  the  direction  c  >  c*,  the  stability  region  C  of  (x*,y*)  is  contained  in  a  box 
defined  by  the  values  7 j. 

To  determine  an  inner  approximation,  we  make  use  of  the  next  result. 
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Algorithm  2  Binary  Solution  Cover 


Require:  An  optimization  problem  P  with  optimal  solution  ( x*,y *)  satisfying  x*  =  0. 


Set  (x°,y°)  <—  (x*,y*),  z0^z\I^N. 
Add  cut  D  =  {J2ieixi  ^  1)  to  P* 


for  k  =  1 , . . . ,  n  do 


Resolve  the  modified  P;  get  new  optimal  solution  (xk,  yk )  and  objective  value  c*xkjrh(yk)  =  z 


if  P  is  infeasible  then 
Set  loo  <—  /. 

Return  A:  and  exit. 

end  if 

Sct/+^{i€iV:*f  =  l},/fc-«-/n/+ 

Set  /  «—  /  \  ;  modify  cut  D  accordingly. 

end  for 


Proposition  2.2.  Let  cP  =  (c|, . . . ,  . . . ,  c*).  Then  c?  €  C,  V  j  €  A. 

Theorem  2.3.  Suppose  we  order  the  x  variables  so  that  z*  >  z\  >  Z2  >  -  -  •>  zn  holds.  Then 


(2) 


an  inner  approximation  of  C . 

Corollary  2.4.  The  set  {c  +  d  :  c  €  C~ ,d  <  0}  is  an  inner  approximation  of  C . 

These  last  two  results  motivate  a  natural  algorithm  for  determining  an  inner  approximation 
Solve  each  of  the  problems  P3  in  turn,  sort  them  by  objective  value,  and  compute  the  inner  ap¬ 
proximation  as  indicated  in  Theorem  2.3.  This  procedure  can  be  modified  slightly  to  potentially 
reduce  the  number  of  solves. 

Algorithm  2  begins  with  a  problem  of  the  form  P  and  an  optimal  solution  (x*,y*)  with  :r*  =  0. 
It  then  adds  a  cut  xi  >  1,  forcing  at  least  one  of  the  x  variables  to  one.  After  resolving,  it 

determines  which  of  the  x  variables  switched,  removes  their  coefficients  from  the  cut,  and  repeats. 
The  end  result  is  a  list  of  solutions,  ordered  by  objective  value,  which  covers  all  possible  x  variables. 
(Variables  not  covered  by  any  solution  on  the  list  are  those  fixed  to  zero  in  any  feasible  solution.) 
For  future  reference,  we  formally  define  the  relevant  information  gathered  during  the  execution  of 
Algorithm  2  as  follows.  The  solution  in  the  k-th  iteration  is  denoted  by  {xk,yk)  and  has  objective 
function  value  2 The  set  indicates  which  binary  variables  have  value  one  ill  (a;*  yk).  The  set 
Ik  indicates  which  of  these  variables  have  value  one  for  the  first  time. 

An  outer  approximation  is  easily  obtained  applying  Remark  2.1  to  each  solution  (xk,yk).  To 
determine  an  inner  approximation,  we  must  first  establish  a  preliminary  fact. 

Proposition  2.1.  Let  i  E  Ijf ,  for  some  k.  Then  (xk,  yk)  is  optimal  for  Px. 

Note  that  ( xk ,  yk)  is  not  necessarily  optimal  for  Pj,  for  j  €  We  now  combine  Proposition 

2.1  with  Theorem  2.3  to  construct  our  inner  approximation. 

Theorem  2.2.  Suppose  Algorithm  2  is  run  on  problem  P ,  terminating  after  A  <  n  steps.  Let 
(xk,  yk ),  Zk.l^  k  =  l,...J  and  I oo  be  obtained  from  the  algorithm..  Then 
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is  an  inner  approximation  of  C. 

It  is  important  to  note  that  the  stability  region  C  depends  only  on  (x*,  y*)  and  h(y),  and  not  on 
c* .  However,  the  inner  approximation  we  calculate  using  Algorithm  2  does  depend  on  c*,  because 
c*  appears  in  the  right-hand  side  of  each  inequality  and  also  because  different  starting  costs  may 
determine  different  solutions  in  the  algorithm  when  we  add  the  inequality  Yliel Xi  —  So  any 
c*  €  C  can  be  used  in  Algorithm  2  to  obtain  a  possibly  different  inner  approximation. 

We  next  address  the  quality  of  approximation  of  the  inner  and  outer  regions  C~ ,  C+  generated 
by  Algorithm  2.  For  any  c  €  C+  \  C~,  we  are  unable  to  determine  the  optimality  of  (x*,y*)  for 
P(c)  without  re-optimizing.  Ideally,  we  would  like  to  estimate  the  volume  of  this  uncertainty  region 

\  C-,  and  perhaps  compare  it  to  the  volume  of  C~  and  C+.  However,  even  for  problems  with 
modest  dimension  we  cannot  efficiently  estimate  this  volume. 

In  light  of  these  computational  difficulties,  we  have  developed  a  “shooting”  experiment  to  give 
some  idea  of  the  relative  sizes  of  C-,  and  C+  \  C~ .  Starting  from  the  original  objective  vector 
c*,  we  generate  a  random  direction  d  by  uniformly  sampling  each  component  from  [0,  1].  We  then 
compute  the  quantities 

A-  =  max  {c*  +  At €  C~  }  A+  =  max  {c*  +  Att^t  e  C+  }  ,  (4) 

A  \  M  J  A  \  HI  J 

and  use  the  values  to  compare  the  relative  sizes  of  C~  and  in  the  direction  d. 

We  performed  the  shooting  experiment  on  the  problem  instances  contained  in  MIPLIB  3.0. 
For  a  given  instance,  we  considered  as  x  variables  all  binary  variables  that  satisfied  the  following 
conditions: 

•  The  variable  is  not  fixed  to  its  optimal  value  in  all  feasible  solutions.  In  terms  of  Algorithm 
2,  this  means  the  variable  docs  not  belong  to  the  set  1^. 

•  There  is  no  alternate  optimal  solution  with  the  variable  set  to  its  complement.  That  is,  we 
have  >  Zj. 

We  refer  to  such  binary  variables  as  “active.”  If  a  particular  MIPLIB  instance  did  not  have  am 
active  binary  variables,  we  discarded  it.  We  also  skipped  problems  with  more  than  5,000  binary 
variables  and  problems  which  did  not  solve  to  optimality  within  one  hour  of  CPU  time.  All 
computational  experiments  were  carried  out  on  a  system  with  two  2.4  GHz  Xeon  processors  and 
2  GB  RAM,  and  using  CPLEX  11.1  (with  default  settings)  as  the  optimization  engine.  For  each 
instance,  wc  generated  100,000  direction  vectors  d. 

Table  2  contains  average  results  for  our  experiments.  For  each  instance,  we  report  the  total 
number  of  decision  variables  (#  vars),  the  total  number  of  binary  variables  (#  bin),  the  number 
of  active  binary  variables  (#  active),  the  Euclidean  norm  of  the  cost  vector  corresponding  to  the 
active  variables  (||c*||),  the  average  A“  and  A+  values,  and  their  ratio.  For  example,  for  problem 
lseu,  85  of  89  decision  variables  are  active.  In  the  directions  tested,  C~  occupies  87%  of  the  volume 
of  C+,  but  both  regions  allow  only  for  a  relatively  small  change  in  c* ,  since  ||c*||  is  significantly 
larger  than  A~  and  A+. 

As  Table  2  indicates,  the  estimated  volume  ratio  of  C~  and  varies  significantly  from  one 
instance  to  the  next.  On  one  end,  Algorithm  2  delivers  a  fairly  tight  approximation  for  problems 
dcmulti  (92%),  enigma  (95%),  and  lseu  (87%),  to  name  a  few.  In  fact,  Algorithm  2  even  delivers 
the  exact  stability  region  for  p0033.  On  the  other,  the  volume  ratio  is  very  small  on  instances  such 
as  p2756  (7%),  vpml  (3%)  and  vpm2  (14%).  This  discrepancy  is  certainly  in  part  due  to  differences 
in  the  number  of  active  binary  variables  (14  in  enigma  and  2,391  in  p2756,)  since  volume  differences 
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tend  to  grow  as  dimension  grows.  However,  part  of  this  discrepancy  is  also  instance  dependent,  as 
some  problems  with  similar  number  of  active  variables  have  very  different  ratios. 

Overall,  the  experimental  results  indicate  that  the  volume  of  \  C~  is  significant  for  some 
instances. 

Therefore,  we  next  address  the  cause  of  this  discrepancy:  Are  we  under- approximating  C~  or 
over- approximating  C+?  To  answer  this  question,  we  performed  the  following  additional  experi¬ 
ment.  For  each  of  the  first  1000  random  directions  d  that  yield  A”  <  A+,  we  construct  a  new  cost 
vector 


c,  _  ,  A  +  A+  i  ec+\c~ 

*  +  2  \\d\\  '  ’ 

and  re-optimize  the  problem  with  this  new  objective.  We  then  count  the  number  of  times  the 
original  optimal  solution  remains  optimal  for  the  new  cost  vector  c±.  The  next  column  in 

Table  2  (OIO,  for  “original  is  optimal”)  reports  these  counts  for  all  instances  except  p0033.  With 
the  exception  of  dsbmip,  rentacar  and  rgn,  the  original  solution  is  optimal  most  of  the  time,  with 
most  instances  recording  counts  above  800  and  many  getting  1000.  These  results  indicate  that 
the  uncertainty  region  C+  \  C~  is  caused  primarily  by  an  excessive  under-approximation  of  C~ . 
They  also  suggest  that  (x*,y*)  is  likely  to  remain  optimal  inside  C+,  even  when  this  fact  cannot 
be  guaranteed. 
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Tabic  2:  Shooting  experiment  results  for  MIPLIB  3.0  instances. 


For  further  details  on  the  approach  and  computational  experience  with  the  algorithms  see: 

F.  Kilmg-Karzan,  A.  Toreillo,  S.  Ahmed,  G.L.  Nemhauser,  M.W.P.  Savelsbergh.  “Approximating 
the  Stability  Region  for  Binary  Mixed-Integer  Programs.”  Submitted  to  Operations  Research 
Letters  (2008). 

3  Information-based  Branching  for  Integer  Programming 

Branch-and-bound  ( B&B )  empowered  with  advanced  features  such  as  presolve,  cuts,  heuristics  and 
strong  branching  is  the  preferred  algorithm  for  solving  Mixed-Integer  Linear  Programming  (MIP) 
problems.  We  focus  on  branching.  A  subproblem  corresponding  to  a  node  in  a  B&B- tree  is  fathomed 
when  we  are  able  to  certify  that  it  has  been  fully  explored,  that  is,  all  feasible  solutions  to  it  have 
been  explicitly  or  implicitly  visited;  otherwise,  it’s  referred  to  as  an  open  subproblem.  In  a  B&B- 
scheme,  open  subproblems  are  generated  in  a  rigid  way,  depending  on  the  branching  rule.  In  fact, 
every  open  subproblem  is  restricted  to  be  identical  to  a  previously  examined  subproblem  except  for 
its  last  chronologically  branched  variable.  Therefore,  inappropriate  branchings  performed  at  the 
beginning  can  jeopardize  the  effectiveness  of  the  branc  h-and-bound  method. 

Changing  the  policy  for  branching  variable  selection  can  have  a  dramatic  effect  on  the  over¬ 
all  time  needed  to  solve  a  problem.  Most  existing  branching  variable  selection  methods  either 
estimate  the  impact  of  the  candidate  variables  on  the  objective  function  of  the  LP  relaxation  or 
provide  bounds  on  the  degradation.  The  candidate  variable  having  the  greatest  estimated  impact 
is  then  chosen  for  branching.  The  motivation  is  to  select  the  branching  variable  that  maximizes  the 
degradation  of  the  objective  function  value  at  the  optimal  solution  of  the  child  node  LP  relaxation 
which  gives  a  tighter  bound  on  the  unsolved  nodes. 

The  “information-based  variable  selection  methods”  we  have  developed  use  a  variety  of  means 
to  estimate  the  impact  of  each  candidate  variable  on  fathoming  based  on  previously  collected 
information.  These  new  methods  recognize  that  the  branching  decisions  made  at  the  top  of  the  tree 
are  the  most  important  ones  In  order  to  make  more  informed  decisions  at  the  top,  first  a  traditional 
started  and  after  a  certain  amount  of  information  is  collected  from  the  fathomed  nodes,  the 
process  is  halted.  A  restart  of  the  i?£VZ?empowered  by  exploiting  the  information  contained  in  a  set 
of  previously  fathomed  subproblems  is  performed.  In  contrast  to  the  current  branching  variable 
selection  methods  based  on  the  degradation  estimate  in  the  objective  function  value,  we  favor  the 
variables  that  have  the  most  impact  on  the  fathoming  of  the  child  nodes.  The  general  idea  is  to 
arrive  at  child  nodes  which  are  closer  to  being  fathomed,  in  the  hope  that  one  or  both  of  the  child 
nodes  will  never  be  expanded.  Note  that  using  the  information  derived  directly  from  the  previous 
search  nodes,  as  in  backjumping  or  learning  B&B ,  may  be  ineffective,  as  they  have  been  “spoiled” 
by  inappropriate  branchings  earlier  in  the  process.  Instead,  in  our  approach,  we  strengthen  the 
information  on  fathomed  subproblems  by  eliminating  the  unnecessary  branching  decisions  which 
had  no  effect  on  their  fathoming.  In  addition  to  new  branching  methods,  we  use  this  information 
in  propagation  and  in  the  generation  of  valid  inequalities. 

We  consider  the  MIP  problem 

min  |cTx  4-  dTy  \  Ax  +  By  ^  b,  x  e  {0,  l}n,  y  E  R+ j  (P) 

where  c  E  Rn,  d  E  Rk,  b  E  Rm,  A  E  Rmxn  and  B  E  Rmxfc.  Its  LP  relaxation  is  obtained  by 
replacing  x  E  {0, 1  }n  by  0  <  x  <  1. 

Recall  that  in  a  B&B- tree,  or  for  short  a  tree,  a  node  can  be  fathomed  in  three  ways:  (i)  the 
node  LP-relaxation  is  infeasible,  (ii)  the  optimal  solution  to  the  node  LP-relaxation  is  integer,  and 
(iii)  the  optimum  value  of  the  node  LP-relaxation  is  no  better  than  the  objective  value  of  the 
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currently  best  known  integer  feasible  solution.  To  formalize  the  notion  of  fathoming,  we  use  the 
following  notation. 

We  say  /j  is  the  fixing  of  the  variable  j  to  the  value  l  E  {0, 1}.  Let  iVo  denote  the  root  node, 
Ni  denote  a  node  in  the  tree  and  C*  =  Cf  U  C\  be  the  set  of  binary  variables  fixed  at  node  Ar, , 
where  Cf  (Cj)  denotes  the  indiees  of  binary  variables  fixed  to  0  (1).  Without  loss  of  generality,  we 
assume  that  Co  =  0. 

Definition  1.  If  Ni  is  a  fathomed  node  of  a  binary  tree,  then  Cx  is  called  a  clause  corresponding 
to  node  Nx.  When  \CX\  =  n,  the  clause  is  called  trivial. 

Definition  2.  Let  C  be  a  clause.  If  C  \  /j  is  not  a  clause  for  any  /j  E  C,  then  C  is  called  a 
minimal  clause. 

If  C  is  a  minimal  clause,  any  node  Nx  in  the  tree  with  Cx  C  C  eannot  be  fathomed  by  any  of  the 
fathoming  rules.  Moreover  any  node  Nz  with  Cx  D  C  ean  be  fathomed  together  with  the  subtree 
rooted  at  Nx.  Note  that  there  exists  a  minimal  elause  (not  lieeessarily  unique)  associated  with  each 
fathomed  node  in  any  binary  tree. 

Given  a  elause  C  =  C°  U  C1,  the  inequality 

^  Xj  +  -Xj)  >  1  (5) 

i€C°  tec1 

eliminates  all  solutions  that  do  not  eomply  with  it.  Clearly  (5)  might  cut  off  some  feasible  solutions. 
However,  the  region  eut  off  by  (5)  is  guaranteed  not  to  eontain  any  feasible  solution  with  an  objective 
value  better  than  the  optimal  objective  value.  Note  that  (5)  is  a  generalized  cover  inequality  and 
hence  by  obtaining  minimal  clauses,  wc  actually  derive  a  minimal  generalized  cover  inequality  for 

(P). 

Wc  wish  to  efficiently  identify  the  most  useful  clauses  and  use  them  effectively  in  B&Bby 
exploring  the  restart  framework.  Our  approach  is  mainly  based  on  deriving  information  in  the 
form  of  elauses  from  the  fathomed  nodes  of  a  partial  tree.  We  employ  this  information  in  guiding 
the  search  through  designing  advanced  preprocessing,  propagation  and  branching  schemes  as  well 
as  in  generating  valid  inequalities  of  the  form  (5).  It  is  quite  likely  that  the  elauses  obtained  from  a 
partial  tree  are  not  minimal.  So  we  strengthen  the  information  on  fathomed  nodes  by  eliminating 
the  unnecessary  fixings  which  had  no  effeet  on  fathoming.  We  do  this  by  solving  a  MIP  model  that 
obtains  a  minimal  elause  from  a  given  clause. 

Next,  we  present  a  model  that  ean  be  used  to  generate  a  minimal  elause  of  minimum  cardinality. 
Without  loss  of  generality,  we  assume  that  the  upper  and  lower  bounds  of  the  binary  variables  arc 
also  included  in  the  original  formulation  as  constraints.  Let  v*  be  the  objective  value  of  the  optimum 
solution  to  (P).  We  ean  fathom  any  node  of  the  tree  wliieh  is  either  infeasible  or  has  an  objective 
value  greater  than  v* .  Fathoming  by  integrality  is  quite  infrequent  in  praetiee  and  given  v* ,  we  can 
simply  fathom  all  nodes  with  objective  value  greater  than  or  equal  to  u*,  which  includes  fathoming 
of  all  integer  solutions  as  well. 

Consider  a  leaf  node  of  the  tree  that  is  fathomed  by  bound  and  denote  the  corresponding  set 
of  variable  fixings  by  C  =  C°  U  C1.  The  LP  relaxation  of  this  leaf  node  is 

min  cTx  +  d1  y 
s.t.  Ax  +  By  >  b 

xt  >1  Mi  E  C{ 

-xx>0  Mi  E  C° 

x  E  R+,  y  E  R+ 
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Define  the  following  variables: 


o  _  J  1,  if  inequality  —  Xj  >  0  is  added  to  the  LP  relaxation: 

\  0,  otherwise, 

and 

i  f  1,  if  inequality  Xi>  1  is  added  to  the  LP  relaxation; 

\  0,  otherwise. 

Using  these  new  variables,  the  following  MIP 

min  cTx  +  dTy 
s.t.  Ax  -f  By  >  b 

z\xi  >  z\  Vz  G  C1 

-  >0  Vz  G  C° 

a:  G  R" ,  y  G  R+ 

is  equivalent  to  the  node  LP  when  all  of  the  binaries  in  it  are  set  to  1. 

We  want  to  find  a  minimum  number  of  bound  inequalities  such  that  their  inclusion  in  the 
original  linear  programming  relaxation  of  (P)  will  still  lead  to  a  fathoming,  i.e.,  either  the  LP 
relaxation  is  infeasible  or  the  objective  value  exceeds  the  lower  bound  value,  v*.  Since  the  original 
root  LP  relaxation  is  assumed  to  be  feasible  and  bounded  from  below,  its  dual  is  also  feasible.  Thus 
we  know  that  in  the  case  of  infeasibility  of  the  node  LP,  the  dual  of  the  node  LP  is  unbounded 
since  when  we  add  new  bound  inequalities  to  the  primal,  we  are  just  adding  new  columns  to  the 
dual  of  the  node  LP.  Therefore  when  a  node  is  fathomed  by  infeasibility,  we  will  be  able  to  find  a 
dual  solution  with  an  objective  value  strictly  greater  than  v*.  So  now  we  consider  the  dual  of  the 
above  formulation  by  treating  the  variables  z and  z\  as  parameters  and  we  obtain 


max  \rb+  7} z\ 

iec1 

s.t.  XIAl  +  7/2/  <  a 

Vi  €  C1 

Ar^  -  ^  <  c; 

Vi  6  C° 

XTAl  <  d 

Vi  €  {1,.. 

•  >n)  \(c°uc1) 

X1  Bj  <  dj 

Vj€{  1... 

A,  >0 

7?’7i  >0. 

Vi  6  {1,.. 

.,m} 

Since  we  are  only  interested  in  the  existence  of  dual  solutions  with  objective  value  greater  than 
or  equal  to  iA  we  can  equivalently  turn  the  objective  into  a  constraint.  By  considering  zf*  and 
z\  as  binary  variables  with  the  condition  that  at  most  one  of  them  can  be  set  to  1  for  each  i 
(since  C°  PI  C1  =  0,  we  don’t  need  to  include  these  constraints  explicitly),  we  obtain  the  following 
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formulation  where  we  minimize  the  number  number  of  z®  and  z\ : 

min  zi  +  Zi 
ieC°  ieC1 


s.t.  Xrb  +  ^2  7 >  v* 
teC1 


X1  At  +7 jz}  <  ct 
XT  Ai  -  <  Ci 

XT  At  <  Ci 


Vi  €  C1 
Vi  6C° 

Vie  {1, . . . ,  n}  \  (C°  U  C1) 


A 1  Dj  <  dj 
Xt  >0 


,o 

1  ' 


6  {0, 1}, 


Vj€{l, 

Vi  €  {1, ... ,m} 
7? >  7i  >  0. 


Ill  order  to  bound  the  dual  variables  7  and  A  from  above  by  1,  we  introduce  another  variable 
a  and  also  to  linearize  the  model,  we  introduce  u and  u\  for  the  nonlinear  terms  7 fz®  and  7 \z\ 
respectively: 

min  ^  A  +  J2  zt 

ieC°  ieC1 

s.t.  A Tb+  u\  —  av*  >  0 

itC1 


A7  Ax  +  v\  —  QtCi  ^  0 

Vi  g  C1 

\r Ax  —  u®  —  aci  <  0 

Vi  €  C° 

XrAx  —  acx  <  0 

Vi  €  {1,. 

,.,n}\ 

(C°  U  C1) 

A 1  Bj  —  adj  <  0 

Vj€{l,. 

«?  <  7<\  <  *1 

-  7,n  -  2? 

>  -1 

Vi  g  c° 

'’Vf 

VI 

VI 

e 

1 

1 

JN! 

>  -1 

Vi  G  C1 

0  <  A;  <  1 

Vi  €  {1,. 

. . ,  m} 

z^z}  €  {0,1},  0<tif,«],7?,7i  <1 

0  <  a. 


Note  that  the  fixing  of  the  last  variable  on  the  path  from  root  node  to  a  leaf  node  is  the  main 
cause  of  fathoming  done  at  the  leaf  node.  Hence  in  any  feasible  solution  to  the  above  formulation, 
we  will  always  have  the  corresponding  binary  variable  (z®  or  z\)  fixed  to  1.  In  our  computations, 
we  take  advantage  of  this  fact  by  actually  fixing  the  value  of  the  last  branched  variable  in  this  MIP 
and  we  replace  a  >  0  with  a  >  10“6. 

Let  C  =  {Ci,  ...,Ca  }  be  a  set  of  clauses  in  a  partial  tree.  Consider  a  node  of  the  tree  Ar  other 
than  the  root  node,  and  let  C®(CX)  denote  the  set  of  binary  variables  fixed  to  0  (1).  We  say  a 
clause  C  =  C°  U  C1  is  active  at  N  if  C°  fl  C1  —  0  and  C1  fl  C°  =  0,  i.e.  if  it’s  possible  to  obtain  a 
child  node  (not  necessarily  immediate)  from  the  current  node  that  can  be  fathomed  by  the  clause 
C.  Whenever  a  clause  becomes  inactive  for  a  particular  node,  it  will  remain  inactive  for  all  the 
child  nodes  of  that  node.  Since  some  variables  are  already  fixed  at  TV,  the  active  clauses  can  be 
updated  using  this  information,  i.e.,  the  clause  C  =  (C°  \  C°)  U  (C1  \  C1)  indicates  the  possible 
extension  of  the  current  node  to  a  child  node  which  can  be  fathomed  by  clause  C.  In  the  rest  of 
the  text  whenever  we  refer  to  an  active  clause  at  a  node,  we  actually  refer  to  the  updated  version 
of  the  clause. 
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Whenever  an  active  clause  C  lias  only  one  variable,  i,e.,  \C\  =  1,  we  can  immediately  fix  the 
value  of  that  variable.  Suppose  C  =  C°  =  {j},  then  we  can  set  x3  =  1  and  create  only  a  single 
child  node,  as  the  other  branch  will  automatically  be  fathomed  by  the  elause.  We  refer  to  this  as 
propagation. 

Given  a  node  and  a  set  of  aetive  elauses  at  that  node,  there  are  several  ways  to  use  this 
information  in  determining  a  branching  variable.  Let  x  be  the  veetor  of  eurrent  LP  relaxation 
values  of  variables  at  the  node.  We  first  weight  eaeh  aetive  elause,  denoted  by  uj(Ci),  to  estimate 
its  importance  in  fathoming.  We  have  tested  four  different  alternatives  to  estimate  uj(Cz): 

i)  aj(Cl)  =  1  for  all  elauses  C\  (i.e.  all  elauses  are  of  equal  importance), 

ii)  uj(Ci)  =  j^Tj  where  |C*|  is  the  number  of  variables  included  in  elause  C*  (i.e.  short  clauses  are 
preferred), 

in)  u^(Cj)  =  ^  —  j  — ~ ( where  we  look  at  the  possible  closeness  to  violation  of  the 

clause  inequality  (5), 

iv)  uj(Ci)  =  2~^c  ,l  (exponentially  higher  preference  given  to  the  shorter  elauses). 

Then  using  the  weights  of  the  elauses,  we  estimate  the  effectiveness  of  fixing  the  binary  variable 
Xj  to  0  (1),  denoted  by  0®  (/?])•  We  have  tested  two  alternatives  to  estimate  the  overall  effect  of 
creating  a  braneh  with  Xj  =0(1): 

i)  Pj  =  uiPi)  alld  =  EEec,1  “>(<3), 

v)  pj  =  max{u>(Ci)  :  j  £  Cf}  and  /3j  =  max{^(C2)  :  j  €  C}}. 

Let  0j  denote  the  overall  effect  of  a  branching  on  variable  x3.  To  estimate  /?7,  we  need  to 
combine  0®  and  0* .  Inspired  by  currently  used  branching  rules,  we  suggest  and  test  the  following 
alternatives: 

0  Pj  =  niiii{ij,  1  -  Xj}  *{Pj  +  Pj), 

«)  Pj=P°j+P). 

in)  pj  =  inax{/3®,/3]}  +  10  *  min{/3®,/3]}. 

Note  that  the  first  alternative  considers  the  fraetionality  of  the  variable,  whereas  the  second  one 
simply  adds  the  individual  effects  and  the  third  one  is  similar  to  the  weights  used  in  strong  branch¬ 
ing- 

in  Table  3,  we  report  computational  results  for  a  set  of  difficult  instances  from  MIPLIB.  We 
provide  information  about  the  collection  phase  (solve  time,  number  of  elauses  collected  and  the 
average  size),  the  improvement  phase  (minimum,  average,  and  maximum  solve  time,  number  of 
clauses  improved,  average  size  after  improvement),  and  the  solve  phase  (number  of  nodes  and 
solution  time  CP  LEX,  information-based  search  with  basic  clauses  Basic ,  and  information-based 
search  with  improved  clauses  Improved.  If  we  eompare  the  node  eounts  for  CPLEX  and  Improved , 
we  see  that  in  all  of  the  instances  the  node  eounts  decrease  and  the  improvements  in  harder  problems 
are  more  significant  (for  mas74  reducing  from  1656970  nodes  to  1167900  nodes,  for  qiu  from  15862 
to  5139,  and  for  rout  from  45051  to  10933).  These  improvements  in  node  eounts  also  generally 
lead  to  improvements  in  solution  times. 


14 


TJ1 

a) 

o 

% 


p 

o 

£ 


<D 


CO 

JU 

2 

Eh 


15 


For  further  details  oil  the  approach  and  additional  computational  results,  see: 


F.  Kihng-Karzan,  G.L.  Nemhauser,  M.W.P.  Savelsbergh.  'Information  Based  Branching  Rules  for 
Binary  Mixed  Integer  Problems.”  Submitted  to  INFORMS  J.  on  Computing  (2009). 
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